Machine learning methods for transcription data integration
Abstract
Gene expression is modulated by transcription factors (TFs), proteins that generally bind to DNA near the coding regions of their target genes. The relationship between these regulatory proteins and their targets is typically many-to-many: each target can be regulated by more than one TF, and each TF can contribute to the regulation of more than one target. One consequence of this combinatorial regulation is the emergence of control networks, which can regulate nuanced changes in phenotype in response to subtle environmental changes. A first step toward a complete molecular understanding of transcriptional regulation is to associate each TF with the set(s) of genes that it regulates. In this paper we report on the ability of support vector machines (SVMs) to associate 104 TFs with their binding sites. Several types of data are used to train classifiers for TF binding prediction in the Saccharomyces cerevisiae genome: position-specific scoring matrix (PSSM) matches, conservation of PSSM matches in other genomes, gene expression, phylogenetic profiles, Gene Ontology functional annotation, TF-target expression correlation, and promoter sequence composition using various subsequence features. The classifiers are combined using a weighting scheme, so that each type of genomic data contributes to the classification of a binding site according to its performance. In each case, the predictions are compared with known true positives taken from ChIP-chip data, Transfac, and the Yeast Proteome Database. The SVM performs best when all genomic data types are combined, and it also rank-orders the data types by performance. On average, we can classify binding sites with a sensitivity of 73% and a positive predictive value of almost 0.9. New ideas and preliminary work for improving SVM classification on biological data are also discussed.
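The abstract does not give the exact weighting formula, so the following is only a minimal sketch of the general idea, assuming that each data type already has a trained SVM producing decision values for candidate binding sites and that each type's weight is proportional to a validation-performance score. The data-type names, scores, and the helper functions `performance_weights`, `combine`, and `sensitivity_ppv` are hypothetical. The sketch also shows the two reported metrics: sensitivity = TP / (TP + FN) and positive predictive value (PPV) = TP / (TP + FP).

```python
from typing import Dict, List, Sequence, Tuple


def performance_weights(val_scores: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical scheme: weight each data type by its validation score,
    normalized so that the weights sum to 1."""
    total = sum(val_scores.values())
    return {name: score / total for name, score in val_scores.items()}


def combine(decisions: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Weighted sum of per-data-type SVM decision values for each candidate site."""
    n_sites = len(next(iter(decisions.values())))
    return [
        sum(weights[name] * decisions[name][i] for name in decisions)
        for i in range(n_sites)
    ]


def sensitivity_ppv(predicted: Sequence[bool], actual: Sequence[bool]) -> Tuple[float, float]:
    """Return (sensitivity, positive predictive value) for binary predictions."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, ppv


if __name__ == "__main__":
    # Made-up validation scores and decision values for four candidate sites.
    weights = performance_weights({"pssm": 0.8, "expression": 0.6, "go": 0.6})
    decisions = {
        "pssm": [1.2, -0.5, 0.8, -1.0],
        "expression": [0.4, 0.9, -0.6, -0.2],
        "go": [0.5, -0.8, 0.2, 0.6],
    }
    scores = combine(decisions, weights)
    predicted = [s > 0 for s in scores]  # positive decision value => predicted site
    known = [True, True, True, False]    # hypothetical known true positives
    sens, ppv = sensitivity_ppv(predicted, known)
    print([round(s, 2) for s in scores])  # [0.75, -0.17, 0.2, -0.28]
    print(round(sens, 2), ppv)            # 0.67 1.0
```

Weighting by validation performance lets a weak data source still contribute without dominating the combined decision; other schemes (for example, learning the weights with a second-level classifier) would fit the same description in the abstract equally well.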
Journal title:
- IBM Journal of Research and Development
Volume 50
Publication date: 2006